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Method of authenticating digital data works 


Field of the invention 

5 

This invention relates to a method of authenticating digital data works, particularly to 
enable any unauthorised alterations to that work to be rendered readily detectable. 

10 Description of the Prior Art 

Systems based upon digital data are becoming universal and indispensable; digital data 
passing between computers; digital telecommunications; digital audio; digital cameras; 
and the convergence of many of these individual components into multi-media, are a 
15 selection of the technologies to which this invention relates. 

There are many applications for a technique that can enable any unauthorised alterations 
to that work to be rendered readily detectable. For example, it is today very easy to 
tamper with a digital photograph, rendering the authenticity of any digital photograph 

20 questionable. That will have serious implications for the use of digital photographic 
evidence in criminal litigation, for example. It may therefore be advantageous to be able 
to assert that the integrity of any given digital photograph can be assured. Similarly, it 
is becoming common to archive documents, including legal contracts and financial 
instruments, by imaging and storage on non-erasable digital media, such as WORM. 

25 There is a pressing need to ensure that those digital records are tamper evident. There 
are similar issues pertaining to digital audio and video. For example, where digitally 
recorded speech is to be used in evidence, typically to confirm the existence and terms 
of an oral agreement, then validation of the integrity of the recording is particulariy 
helpful. 

30 

There are various established approaches to ensuring data integrity in the 
telecommunications and digital audio field; for example, the use of error correction 
techniques relying upon check-sums. However, these techniques are designed to 
ensure that a digital signal generated by a device, for example, a CD player, is an 
35 accurate transmission or reproduction firom a source, for example the data stored within 
the CD. That is different from being able to detect if, witiiout access to the original CD, 
any duplicate made of the CD is a completely accurate reproduction of the original CD. 
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The present invention is directed to solving this latter problem. Hence, the present 
invention is not directed to manipulating small units of digital data to enable that data to 
carry information inherent to the proper comprehension of the digital data itself but 
instead to modifying various sub-sets within a digital data work in such a way that any 
5 subsequent unauthorised modification or alteration to any of those sub-sets is readily 
detectable. 

Currently, it is possible to include a simple identifier in the header of the data file of a 
digital data work. The header might typically comprise a checksum derived from the 
10 contents of the data file so that any alteration of the contents inevitably leads to a 
mismatch with the checksum; the mismatch can readily be detected, enabling the 
alteration to be detected. However, it can be relatively easy to strip out the header 
checksum entirely, in which case one could not establish the integrity of the data. 

15 Statement of the invention 

In accordance with the present invention, a method of authenticating a digital data 
work, to enable any unauthorised alteration to that work to be readily detectable, 
comprises the steps of dividing the whole or part of the work into sets of data elements 

20 with each data element or set of data elements having a measurable characteristic, 
selecting a particular set of data elements and then modifying a pre-determined sub-set 
of the data elements in that set so that the measurable characteristic of that pre- 
determined sub-set, or the data elements in that sub-set, satisfies a predetermined 
relation, the measurable characteristic of any sub-set, or the data elements in that sub- 

25 set, being measurably changed if any data element of that sub-set is altered. 

Hence, the essence of the invention is to enable the integrity or authenticity of a digital 
data work to be established by changing the digital data work according to a particular 
algorithm so that some or all of its constituent parts possess a measurable characteristic; 
30 that measurable characteristic is changed if any alterations to the digital data work are 
subsequently carried out. That change can then be detected using a detection process. 

By way of example, if the digital data work is a rectangular photographic image, that 
work might be subdivided into smaller rectangles; for a particular such smaller 
35 rectangle, a sub-set of the data elements making up that smaller rectangle could be 
selected according to a rule derived from a private key. Some of these selected elements 
might then be modified so that a particular relationship is satisfied by them; for 
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instance, that the sum of their values [where relevant] is a multiple of 7. The same 
rectangle could then be divided into a different set of elements which may or may not 
overlap the first set. Some of the values of the data elements might then be modified to 
satisfy a further relationship. However, where this is the case, and the two sets of 
5 elements within the same rectangle overlap, then the further modification must preserve 
the first relationship. Every element of each small rectangle should preferably form part 
of at least one set of data elements. In a preferred embodiment, every element of each 
small rectangle must form part of at least one set of data elements. This ensures that no 
pixel can escape checking. The process of modification according to a particular rule is 
10 carried out on preferably all the rectangles into which the image is divided. 

More generally, the digital data work may be divided into convenient sets of data 
elements, the members of which may then be modified according to a required 
algorithm. In the case of audio data, for example, the digital signals may be split into 
15 sets of amplitude signal levels of any suitable length. These sets of data elements can 
then be sub-divided into individual data elements, with a subsequent modification to 
those individual data elements as described above. 

It is important that the method of sub-division, i.e. selection of the actual data elements 
20 in a particular set of data elements to be modified, is private, as should the relationships 
to be obeyed by the modified data elements. Commonly, each will be known only to 
the originator of the digital data work, or alternatively some person unconnected in any 
way with the work itself. For example, in the case of photographic equipment designed 
to produce photographs for evidential purposes, e.g. security cameras, it may be only 
25 the manufacturer of the equipment who knows the method of sub-division and the 
relationships obeyed by the modified data elements. Either or both of the method of 
sub-division and the relationships obeyed by the modified data elements may preferably 
be derivable from a key selected by the manufacturer or originator. It will then be 
highly improbable that anyone might by chance modify the digital data work in a way 
30 that results in various data elements satisfying the predetermined relationship. Further, 
the key could also be formed from two different constituent keys, each constituent key 
being held by a different party. Only when both parties co-operate can a validation 
process then be performed. 
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Detailed Descriprion 

The detailed description that follows is in respect of an embodiment of the invention 
that relates to digitised images. 

5 

For the sake of simplicity, it is supposed that the image concerned is rectangular and 
sub-divided into sets of smaller rectangles, ri, r2, r3, ....rn- The only theoretical 
restriction on the sub-divisions is that they should divide the data in a repeatable way, 
as in a tessellation. 

10 

Suppose that each rectangle, ri, contains m members rn, ri2, ri3, rim- 

Each of these members has one or more measurable characteristics. In the case of 
contone images these would usually be numbers between 0 and 255, that is numbers 
requiring 8 bits of binary data to represent them. These numbers correspond roughly to 
15 the intensity of colours in the element. Again, for the sake of simplicity, it will be 
assumed that the only one such number is present for each element and that this 
describes a position on a greyscale, that is an indication of how nearly black or white an 
element is. 

20 Anyone wishing to ensure the integrity of the data would select a key which typically 
could be a number between 0 and 100,000,000. This key would be known only to the 
originator. This key is used to derive an algorithm which is used as described below. 

Firs, a choice is made of the size of the rectangles ri^ r2, r3, ....rn- For a given 

25 rectangle, ri, a selection of the elements si, $2, S3, Sn, is then made 

according to a scheme derived from the key. These elements have corresponding values 
VI, V2, V3, vn. 

An arithmetic relationship is derived from the key according to previously defined 
30 rules. This is a relationship which the values v[ must satisfy. The choice of type of 
relationship is considerable, ranging from primitive requirements, such as the sum of 
elements being even, or the sum of the individual bits of a binary representation of the 
numbers being a given multiple, through to complicated functional relationships. In 
order to satisfy these relationships, some or all of the values of vp must be adjusted. 

35 

Two major considerations influence the choice of relationships. The first is that the 
probability of the data satisfying the relationship without any adjustment must be small. 
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Secondly, the adjustments required must not be large for any element and only a small 
proportion of values should need to be changed. This latter requirement is necessary to 
ensure that the data work suffers no significant loss of information. 

5 A second set of elements tl, t2» t3 tn. is chosen from the members of ri. This 

set, as before, will be required to satisfy a set of relationships, with the same criteria for 
acceptability. This process continues until every member of rj has appeared in at least 
one selected set. The simplest version of this process is, of course, the division of the 
rectangle into two non-intersecting sets. 

10 

The process is now repeated for every one of the n rectangles, so that the values of 
every element of the image have been related to the values of some other elements. 

The method of detection is virtually die inverse process. Detection can only take place if 
15 the originator's key is known, because that information is necessary to generate the 
appropriate sets and relationships. If part of an image has been tampered with in any 
way, the relationships on one or more of the rectangles ri will not be satisfied. If the 
image has been cropped it may be necessary for the detector to spend some time 
locating the correct origins of the rectangles ri, but there should be no doubt when the 
20 correct position has been discovered because, except in the case of gross distortion of 
the image, the relationship will be satisfied on the majority of the rectangles of the set. 

The nature of the method is unchanged if data is sequential, as in the case of audio data 
or video recordings. There will again be the division of data in a repetitive manner and 
25 adjustment to satisfy prescribed relationships. 

Further Dlustrarive Example 

There now follows a further detailed example of the present invention. For simplicity, 
30 the scale of the following example is smaller than would be used in actual situations. 

Stage 1: Choice of Key 

Any user of an image validation scheme in accordance with this invention is provided 
with a key. The key is used to carry out the initial validation of the image and to check 
35 that no subsequent alterations have been made. In this further example, we assume that 
the chosen key, k, must be in the range 1 to 10000. From this key various numbers 
must be deduced for various parts of the validation process. The only real requirement 
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is that, whatever method is used, each key will give its own unique validation process. 
As an example, a number ks, is required to govern the selection of a subset from a 
rectangular array which represents pan of an image. If this number is required in the 
range 0 to 49, an arbitrary prescription might be to permute the digits of the key in a 
5 predefined manner, divide the resulting number by 50 and take the remainder. The set 
of possible prescriptions is vast. 

Stage 2: Division of Rectangles 

The image array is divided into rectangles. In this illustration, these are 4x4 squares, 

10 ri, r2, r3, 

Suppose that ri is the array indicated below: 

5 8 3 1 

4 9 6 6 

15 2 9 7 6 

7 2 7 2 


Stage 3: Selection of sub-sets 
20 The array ri is sub-divided into sets si, s2, S3, S4. The method of subdivision 
consists of the following two steps. 

Step 1: Choice of permutation 

Since there are 4 members to each of the sets s, we consider permutations of 4 objects. 
25 Given a set a,b,c,d, an arbitrarily chosen permutation of the set can be described by the 
notation: 

P = 1234 
3 142 

30 

This is read as: "The element in position 1 moves to position 3: the element in position 
2 moves to position 1: etc." 

In the validation process, we need to derive such a permutation from a number ks, 
35 derived firom the key, k. Each kg must correspond to a unique permutation. As can be 
seen above, the essence of choosing a permutation is to select a rearrangement of the 4 
numbers 1, 2, 3, 4. One simple method of choosing would be to write down and 
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number all possible permutations of the numbers 1 to 4, and use ks to select which 
permutation is required. 

Step 2: Application of Permutation 
5 Using the above permutation, P, and a fairly obvious notation, P (abed) = (bdac), P^ 
is the notation used to indicate that the permutation P has been applied twice. The first 
application to (abed) gives (bdac) as above. The second application follows thus:- 

P2 (abed) = P(P(abcd)) = P(bdac) = (dcba) 
10 Similarly, p3 (abed) = (cadb) and P^ (abed) = (abed) 

Concatenating (abed) and its permutations we obtain the ordered set:- 
(abcd)(bdac)(dcba)(cadb). 

15 Arranging each set of 4 into a 2x2 square we can arrange the permutations to match the 
rectangle ri thus: 

a b b d 
c d a c 

20 d c c a 

b a d b 

We can now take the set si to consist of those elements in positions marked with an 'a' 
and S2 to consist of those sets marked with a 'b'. Thus si is the set in bold, underlined 
25 type in the first rectangle, below left, and S2 the set in bold, underlined type in the 
second rectangle, below right. 



Similarly for S3 and 54. 

35 
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Stage 4: Modifying Sets 

From the key k, a set of numbers ki, k2, k3, k4 must be derived for each of the sets 
si, S2, S3, S4 in order to enable the validation procedure. 

5 Let us first consider the set si and suppose that ki = 3. The validation procedure 
requires an arithmetic relationship between members of si. Suppose that relationship is 
that the sum of the members of si is to be a multiple of ki ( i.e. 3 in this case). 

From the above si = {5,6,6,2} 

10 

Summing, we have 5 + 6 + 6 + 2= 19. 

This is not a multiple of 3. The nearest multiple of 3 is 18. To satisfy the arithmetic 
relationship we must reduce the total by 1. This will be achieved if we reduce the '5' to 
15 a ■4'. 

Then si = {4,6,6,2) and its sum is a multiple of 3 as required. 

As similar process is carried out with the set S2, the sum of the numbers being required 
20 to be a multiple of k2. Likewise with sets S3, S4 ... 

The process is then repeated for each of the rectangles r2, r3,... until the whole image 
has been modified. 

25 In practice the size of rectangle would be much greater giving far greater choice of sets 
s, and hence making trial and error methods of detection impossible. If for instance, a 
set of 8 elements is chosen from 32 elements the number of possible selections is more 
than 10 million. Further, the elements to be diminished or increased may be chosen to 
spread evenly through the rectangle. 

30 

Detection of Modifications to Validated Images 

If it is suspected that an image has been modified, anyone wishing to detect the 
modification must have knowledge of the key, k, which was used in the modification. 

35 The first stage is then to follow the first steps above. That is, the image must be divided 

into rectangles ri, r2, r3, and from these, for each rectangle, selections si, 

S2» S3, S4, must be made, the choice being governed by the key k. 
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For the set si, the value of ki must be derived from k and as in the case above, will 
have the value 3. 

The elements of si will be summed. If no alteration has been made then the sum will 
5 be4 + 6 + 6 + 2=18as above. The fact that it is a multiple of 3 suggest that no 
alteration has been made. 

If on the other hand, the '2' in si had been changed to a '4', the summation above 
would become :4 + 6 + 6 + 4 = 20. Since this is not a multiple of 3 it would be 
10 apparent that a modification of the image had occurred. 

The same process is carried out throughout the image. 

Clearly in this case there is a high risk that an altered image might by chance satisfy the 
15 arithmetic relationship. In real cases the size of rectangle is greater and the values of 
ki, k2, k3 .... will be greater. In a typical image 3 numbers might be required to 
describe a pixel so that even if there were to be a one in twenty chance of satisfying the 
relationship, there would only be a one in eight thousand chance of satisfying the 
relationship for a whole pixel. 

20 


25 
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Claims 

5 1. A method of authenticating a digital data work, to enable any unauthorised 
alteration to that work to be readily detectable, comprising the steps of: 

dividing the whole or part of the work into sets of data elements with each data 
element or set of data elements having a measurable characteristic; 

selecting a particular set of data elements and then modifying a pre-determined 
10 sub-set of data elements in that set so that the measurable characteristic of that pre- 
determined sub-set, or the data elements in that sub-set, satisfies a predetermined 
relation, the measurable characteristic of any sub-set, or the data elements in that sub- 
set, being measurably changed if any data element of that sub-set is altered. 

15 2 . The method of claim 1 comprising the further step of selecting each other set of 
data elements of the work and then modifying a pre-determined sub-set of data elements 
in each such set in the manner as defined in claim 1. 

3. The method of claim 2 wherein the sub-sets of all the sets of data elements of 
20 the work together consist of all the data elements of the work. 

4. The method of claim 1 wherein the sub-set of data elements in each set are 
selected in accordance with an algorithm derivable from a secret key. 

25 5. The method of claim 1 wherein the modification of the data elements in each 
sub-set is accordance with an algorithm derivable from a secret key such that the 
likelihood of the measurable characteristic arising by chance is very low. 

6. A method of detecting any alteration of a digital data work, to which the 
30 authentication method of claims 1 -5 has been applied, comprising the steps of: 

dividing the whole or part of the work into sets of data elements with each data 
element or set of data elements having a measurable characteristic; 

selecting each particular set of data elements and funher selecting a pre- 
deteraiined sub-set of data elements in each set; 
35 determining whether the measurable characteristic of that pre-determined sub- 

set, or the data elements in that sub-set, satisfies a predetermined relation, and issuing a 


wo 97/02522 


PCT/GB96/01515 


signal if the measurable characteristic of any data element or sub-set does not satisfy 
that predetermined relation, that signal being indicative of the work having been altered. 
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